Mitigating the Curse of Dimensionality for Exact kNN Retrieval
نویسندگان
چکیده
Efficient data indexing and exact k-nearest-neighbor (kNN) retrieval are still challenging tasks in high-dimensional spaces. This work highlights the difficulties of indexing in high-dimensional and tightly-clustered dataspaces by exploring several important tunable parameters for optimizing kNN query performance using the iDistance and iDStar algorithms. We experiment on real and synthetic datasets of varying size, cluster density, and dimensionality, and compare performance primarily through filter-and-refine efficiency and execution time. Results show great variability over parameter values and provide new insights and justifications in support of prior best-use practices. Local segmentation with iDStar consistently outperforms iDistance in any clustered space below 256 dimensions, setting a new benchmark for efficient and exact kNN retrieval in highdimensional spaces. We propose several directions of future work to further increase performance in high-dimensional real-world settings.
منابع مشابه
Minimizing the Number of Keypoint Matching Queries for Object Retrieval
To increase the efficiency of interest-point based object retrieval, researchers have put remarkable research efforts into improving the efficiency of kNN-based feature matching, pursuing to match thousands of features against a database within fractions of a second. However, due to the high-dimensional nature of image features that reduces the effectivity of index structures (curse of dimensio...
متن کاملThe Role of Hubs in Cross-Lingual Supervised Document Retrieval
Information retrieval in multi-lingual document repositories is of high importance in modern text mining applications. Analyzing textual data is, however, not without associated difficulties. Regardless of the particular choice of feature representation, textual data is high-dimensional in its nature and all inference is bound to be somewhat affected by the well known curse of dimensionality. I...
متن کاملAn Appraise of KNN to the Perfection
K-Nearest Neighbor (KNN) is highly efficient classification algorithm due to its key features like: very easy to use, requires low training time, robust to noisy training data, easy to implement. However, it also has some shortcomings like high computational complexity, large memory requirement for large training datasets, curse of dimensionality and equal weights given to all attributes. Many ...
متن کاملLSI vs. Wordnet Ontology in Dimension Reduction for Information Retrieval
In the area of information retrieval, the dimension of document vectors plays an important role. Firstly, with higher dimensions index structures suffer the “curse of dimensionality” and their efficiency rapidly decreases. Secondly, we may not use exact words when looking for a document, thus we miss some relevant documents. LSI (Latent Semantic Indexing) is a numerical method, which discovers ...
متن کاملMinimizing the Number of Matching Queries for Object Retrieval
To increase the computational efficiency of interestpoint based object retrieval, researchers have put remarkable research efforts into improving the efficiency of kNN-based feature matching, pursuing to match thousands of features against a database within fractions of a second. However, due to the highdimensional nature of image features that reduces the effectivity of index structures (curse...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014